Name Admin No Class Assignment
Goh Rui Zhuo 2222329 DAAA/2B/05 Deep Learning CA1

Part A: Convolutional Neural Network¶

__Problem Statement and Background Research__¶


The goal of this part is to implement an image classifier using deep learning network:

  • Dataset contains 15 types of vegetables
  • Colour must be converted into grayscale
  • Two Models must be developed
    1. 31 x 31
    2. 128 x 128

Build two models and compare the accuracies for each ones

What is image classification ? Image classification stands out with its irreplaceable role in modern technology. It involves assigning a label or tag to an entire image based on preexisting training data of already labeled image

__References__¶


  1. https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
  2. https://towardsdatascience.com/everything-you-need-to-know-about-activation-functions-in-deep-learning-models-84ba9f82c253

__Import Libraries__¶


Here is to import all the required libraries

In [316]:
!pip install tensorflow_addons keras-tuner pandas matplotlib seaborn scikit-learn tqdm
Requirement already satisfied: tensorflow_addons in /usr/local/lib/python3.11/dist-packages (0.22.0)
Requirement already satisfied: keras-tuner in /usr/local/lib/python3.11/dist-packages (1.4.6)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (2.1.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.8.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.11/dist-packages (0.13.0)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.11/dist-packages (1.3.2)
Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (4.66.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from tensorflow_addons) (23.1)
Requirement already satisfied: typeguard<3.0.0,>=2.7 in /usr/local/lib/python3.11/dist-packages (from tensorflow_addons) (2.13.3)
Requirement already satisfied: keras in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (2.14.0)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (2.31.0)
Requirement already satisfied: kt-legacy in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (1.0.5)
Requirement already satisfied: numpy<2,>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (1.26.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2023.3)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.44.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.5)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib) (2.4.7)
Requirement already satisfied: scipy>=1.5.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.11.3)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (3.2.0)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (2.0.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (2023.7.22)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python3 -m pip install --upgrade pip

Other Imports¶

In [317]:
import numpy as np
import pandas as pd

import seaborn as sns
from matplotlib import pyplot as plt

from sklearn.metrics import classification_report,accuracy_score, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.preprocessing import Normalizer
from sklearn.metrics import classification_report, accuracy_score

import os, time, math, datetime, warnings, pytz, glob
from IPython.display import display
from functools import reduce
import absl.logging
from tqdm import tqdm
import logging

absl.logging.set_verbosity(absl.logging.ERROR)
logging.getLogger('tensorflow').disabled = True
warnings.filterwarnings('ignore')

Tensorflow Import¶

In [318]:
import tensorflow as tf

from tensorflow.keras.utils import Sequence, to_categorical
from tensorflow import expand_dims
from tensorflow.keras import Sequential
from tensorflow.keras import layers as L
from tensorflow.keras import backend as K
from tensorflow.image import random_flip_left_right, random_crop, resize_with_crop_or_pad
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import (Dense, Input, InputLayer, Normalization, Flatten,BatchNormalization,
    Dropout,Conv2D, GlobalAveragePooling2D, MaxPooling2D, ReLU, Layer,Activation, Multiply, AveragePooling2D,
    Add, RandomRotation,Resizing, Rescaling, Reshape, Concatenate, concatenate, Lambda,LeakyReLU)
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau, TerminateOnNaN, TensorBoard, CSVLogger, Callback
from tensorflow.keras.backend import clear_session
from tensorflow.keras.optimizers import RMSprop, SGD, Adam, Adagrad, Adamax
from tensorflow.keras.regularizers import l2, L2
from tensorflow.keras.optimizers.schedules import *
from tensorflow.keras.metrics import FalseNegatives, categorical_crossentropy, sparse_categorical_crossentropy
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.image import *
from tensorflow_addons.optimizers import SWA


from kerastuner.tuners import Hyperband
from kerastuner import HyperModel
# Setting a seaborn style
sns.set(style="whitegrid")

Set the seed of this notebook¶

In [319]:
seed = 32
tf.random.set_seed(seed)
np.random.seed(seed)

__Check for GPU__¶


Here is check the available GPUs and set the memory growth

In [320]:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)

        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(f"{len(gpus)} Physical GPUs, {len(logical_gpus)} Logical GPU")
    except RuntimeError as e:
        print(e)
1 Physical GPUs, 1 Logical GPU
In [321]:
!nvidia-smi
Sun Nov 12 09:32:37 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
|  0%   28C    P8              27W / 305W |   1296MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

__Import dataset__¶


Here is to import the dataset and proceed to do Exxploratory Data Analysis

Train Data¶

In [322]:
data = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/train'  ,
                                                   color_mode='rgb',
                                                   image_size=(224,224))
data
Found 9028 files belonging to 15 classes.
Out[322]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
Things Observed
    - From the above dataset, we can see that the iamges contains 9028 files with 15 classes<

Validation Data¶

In [323]:
val_data = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/validation'  ,
                                                       color_mode = 'rgb',
                                                       image_size=(224,224) )
val_data
Found 3000 files belonging to 15 classes.
Out[323]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
Things Observed
    - From the above dataset, we can see that the iamges contains 3000 files with 15 classes

Test Data¶

In [324]:
test_data = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/test'  ,
                                                       color_mode = 'rgb',
                                                       image_size=(224,224) )
test_data
Found 3000 files belonging to 15 classes.
Out[324]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
Things Observed
    - From the above dataset, we can see that the iamges contains 3000 files with 15 classes

__Setting the Data__¶


Here we will set the data to X and y for ease of feature engineering later on

Train Data¶

In [325]:
X_train = []
y_train = []

# This gets all the images data into the array
for images, labels in tqdm(data):
    X_train.append(images)
    y_train.append(labels)

X_train = np.concatenate(X_train, axis=0)
y_train = np.concatenate(y_train, axis=0)
100%|██████████| 283/283 [00:01<00:00, 152.02it/s]

Checking the data shapes¶

In [326]:
(X_train.shape, y_train.shape)
Out[326]:
((9028, 224, 224, 3), (9028,))

Validation Data¶

In [327]:
X_val = []
y_val = []

# This gets all the images data into the array
for images, labels in tqdm(val_data):
    X_val.append(images)
    y_val.append(labels)

X_val = np.concatenate(X_val, axis=0)
y_val = np.concatenate(y_val, axis=0)
100%|██████████| 94/94 [00:00<00:00, 176.33it/s]

Checking the data shapes¶

In [328]:
(X_val.shape, y_val.shape)
Out[328]:
((3000, 224, 224, 3), (3000,))

Test Data¶

In [329]:
X_test = []
y_test = []

# This gets all the images data into the array
for images, labels in tqdm(test_data):
    X_test.append(images)
    y_test.append(labels)

X_test = np.concatenate(X_test, axis=0)
y_test = np.concatenate(y_test, axis=0)
100%|██████████| 94/94 [00:00<00:00, 152.55it/s]

Checking the data shapes¶

In [330]:
(X_test.shape, y_test.shape)
Out[330]:
((3000, 224, 224, 3), (3000,))
Things Observed
    - Data is split into X and y for train, validation and test succesfully

__Exploratory Data Analysis (Original Image)__¶


Here is to import the dataset and proceed to do analysis on it

Set the labels for the Classes¶

This is so that we are able to use it for EDA and model training later on

In [331]:
labels_dict = os.listdir('Dataset for CA1 part A/train')
labels_dict = {idx: label for idx, label in enumerate(labels_dict)}
print(labels_dict)
{0: 'Bean', 1: 'Bitter_Gourd', 2: 'Bottle_Gourd', 3: 'Brinjal', 4: 'Broccoli', 5: 'Cabbage', 6: 'Capsicum', 7: 'Carrot', 8: 'Cauliflower', 9: 'Cucumber', 10: 'Papaya', 11: 'Potato', 12: 'Pumpkin', 13: 'Radish', 14: 'Tomato'}

Ckass Distribution¶

The idea here is to identify class imbalance throughout the dataset as imbalance dataset may cause poor performance for the class with less representation, impacting overall performance

In [332]:
def get_classes_distribution(data):
    label_counts = {}
    total_samples = 0

    # Looping through the dataset
    for batch in tqdm(data):
        labels = batch[1]
        # Splitting into the unique labels and counts
        unique_labels, counts = tf.unique(labels)
        for label, count in zip(unique_labels.numpy(), counts.numpy()):
            # If else to check whether label in label counts dictionary
            if label not in label_counts:
                label_counts[label] = count
            else:
                label_counts[label] += count

        total_samples += len(labels)
    label_counts = {vegetable: label_counts[index] for index, vegetable in labels_dict.items()}
    # TO get the percentage here
    for label, count in tqdm(label_counts.items()):
        percent = (count / total_samples) * 100
        print("Label {} contains: {} samples, {}%".format(label, count, percent))

    return label_counts

Displaying the result¶

In [333]:
label_count = get_classes_distribution(data)
100%|██████████| 283/283 [00:01<00:00, 180.36it/s]
100%|██████████| 15/15 [00:00<00:00, 42452.47it/s]
Label Bean contains: 858 samples, 9.50376606114311%
Label Bitter_Gourd contains: 917 samples, 10.15728843597696%
Label Bottle_Gourd contains: 797 samples, 8.828090385467435%
Label Brinjal contains: 786 samples, 8.706247230837395%
Label Broccoli contains: 846 samples, 9.370846256092157%
Label Cabbage contains: 899 samples, 9.95790872840053%
Label Capsicum contains: 805 samples, 8.916703588834736%
Label Carrot contains: 788 samples, 8.72840053167922%
Label Cauliflower contains: 777 samples, 8.60655737704918%
Label Cucumber contains: 968 samples, 10.72219760744351%
Label Papaya contains: 912 samples, 10.101905183872397%
Label Potato contains: 834 samples, 9.237926451041204%
Label Pumpkin contains: 900 samples, 9.968985378821445%
Label Radish contains: 647 samples, 7.166592822330527%
Label Tomato contains: 803 samples, 8.894550287992912%

In [334]:
def plot_dist(count):
    # Unpack the vegetable names and counts
    labels, counts = zip(*count.items())
    
    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    fig.suptitle('Vegetable Class Labels Visualization', fontsize=16, fontweight='bold')
    bars = ax.barh(labels, counts, color=sns.color_palette("viridis", len(labels)))
     # Set the label
    ax.set_xlabel('Total Count', fontsize=12)
    ax.set_ylabel('Vegetable Names', fontsize=12)

    # add the count here
    for bar in bars:
        width = bar.get_width()
        label_x_pos = width + max(counts) * 0.01 
        ax.text(label_x_pos, bar.get_y() + bar.get_height()/2, f'{width}', va='center')

    plt.show()
In [335]:
plot_dist(label_count)
Things Observed
    - Dataset has Cauliflower as the highest representation at 628 - Dataset has Radish as the lowest representation at 922 - Some form of imbalance dataset can be seen

Random Image Visualisation¶

Process of picking random imjages and visualise it

  • This provide us a way to inspect our dataset
  • To ensure that images are of good qualidty
In [336]:
def random_image_visualization(X_data, y_data, labels_dict, num_images=10, new=False, size=None):

    # Get a random index here
    random_indices = np.random.choice(X_data.shape[0], num_images, replace=False)
    # Here is to calculate the number of rows
    rows = num_images // 5 + int(num_images % 5 != 0)
    cols = min(num_images, 5)
    fig, axes = plt.subplots(rows, cols, figsize=(15, 3*rows))

    axes = axes.flatten() if rows > 1 else [axes]

    if new:
        fig.suptitle(f'Random Image Visualisation {size}', fontsize=16, fontweight='bold')
        for idx, ax in enumerate(axes):
            if idx < num_images:
                ax.imshow(X_data[random_indices[idx]], cmap='gray')
                ax.set_title(f"Label: {labels_dict[y_data[random_indices[idx]]]}", fontsize=12)
                ax.axis('off')
            else:
                ax.axis('off') 
    else:
        fig.suptitle(f'Random Image Visualisation ', fontsize=16, fontweight='bold')
        for idx, ax in enumerate(axes):
            if idx < num_images:
                ax.imshow(X_data[random_indices[idx]].astype('uint8'))
                ax.set_title(f"Label: {labels_dict[y_data[random_indices[idx]]]}", fontsize=12)
                ax.axis('off')
            else:
                ax.axis('off') 
    plt.show()
In [337]:
random_image_visualization(X_train, y_train, labels_dict, new=False)
Things Observed
    - Dataset are of pretty good quality

35 Image visualisation¶

Simlar to above random visualisation, this is to

  • Ensure the images are of good qualiy
  • Check first few images here
In [338]:
def visualize_first_images(X_data, y_data, labels, num_images=35, new = False, size=None):
    num_rows = num_images // 7 + (num_images % 7 != 0)
    fig = plt.figure(figsize=(15, 2 * num_rows))

    if new:
        fig.suptitle(f'First 35 Images {size}', fontsize=16, fontweight='bold')
        for i in range(num_images):
            ax = fig.add_subplot(num_rows, 7, i+1)
            ax.imshow(X_data[i], cmap='gray')
            ax.set_title(labels[int(y_data[i])], fontsize=12)
            ax.axis('off')
    
    else:
        fig.suptitle(f'First 35 Images ', fontsize=16, fontweight='bold')
        for i in range(num_images):
            ax = fig.add_subplot(num_rows, 7, i+1)
            ax.imshow(X_data[i].astype('uint8'))
            ax.set_title(labels[int(y_data[i])], fontsize=12)
            ax.axis('off')
        
    plt.show()

visualize_first_images(X_train, y_train, labels_dict)
Things Observed
    - Similar to above, dataset are of pretty good quality

Pixel Distribution¶

To determine which methods to use for scaling of dataset

  1. Check for the distribution of the first to iamges
In [339]:
def visualise_pixel_distribution(X_data, index1, index2):
    fig, ax = plt.subplots(2, 2, figsize=(15, 8))
    fig.suptitle('Pixel Distribution', fontsize=15, fontweight='bold')
    ax[0, 0].imshow(X_data[index1].astype(np.uint8)) # First images
    ax[0, 0].axis('off')
    sns.histplot(X_data[index1].flatten(), ax=ax[1, 0], kde=True, color='blue')
    ax[0, 1].imshow(X_data[index2].astype(np.uint8)) # Second images
    ax[0, 1].axis('off')
    sns.histplot(X_data[index2].flatten(), ax=ax[1, 1], kde=True, color='green')

    plt.show()

visualise_pixel_distribution(X_train, index1=0, index2=1)
Things Observed
    - First Images - Significant variance in colour intensity - Contains peaks at 150 - Wide variation of colour intensity - Second Images - Smaller number of green beans on a light background. - Histogram for this image shows a significant peak around the 200 intensity level, which corresponds to the light background

Mean Pixel Distribution¶

Checking the mean pixke and standard deviation

In [340]:
mean, std = np.mean(X_train) ,  np.std(X_train)
print('Mean of Images:', mean)
print('Standard Deviation of Images:', mean)
Mean of Images: 106.97684
Standard Deviation of Images: 106.97684

Whole Dataset¶

In [341]:
def average_img(X_data):
    average_image = np.mean(X_data.astype(np.uint8), axis=0) / 255
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.imshow(average_image)
    ax.set_title('Average Image', fontsize=15, fontweight='bold')
    ax.axis('off')
    plt.show()
In [342]:
average_img(X_train)

By Classs¶

In [343]:
def mean_class(X_data):
    fig, ax = plt.subplots(2, 5, figsize=(13, 8))
    fig.suptitle('Mean Pixel By Class', fontsize=15, fontweight='bold')
    for index, axs in enumerate(ax.ravel()):
      average = np.mean(X_data[np.squeeze(y_train == index)],axis=0)
      axs.imshow(average.astype(np.uint8))
      axs.set_title(f'Label: {labels_dict[index]}')
      axs.axis('off')
    plt.show()
In [344]:
mean_class(X_train)
Things Observed
    - Overall, the mean image is not as clear here

__Exploratory Data Analysis (31 x 31 Image)__¶


Here is to import the dataset and proceed to do analysis on it

Importing of data¶

  1. 31 x 31 Images was imported again
  2. Conversion of grayscale was done
In [345]:
data_small = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/train'  ,
                                                   color_mode='rgb',
                                               image_size=(31,31))
data_small
Found 9028 files belonging to 15 classes.
Out[345]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 31, 31, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [346]:
X_train_small = []
y_train_small = []


for images, labels in tqdm(data_small):
    images = tf.image.rgb_to_grayscale(images)
    X_train_small.append(images)
    y_train_small.append(labels)

X_train_small = np.concatenate(X_train_small, axis=0)
X_train_small = np.squeeze(X_train_small, axis=-1)
y_train_small = np.concatenate(y_train_small, axis=0)
100%|██████████| 283/283 [00:01<00:00, 179.66it/s]
In [347]:
val_data_small = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/validation'  ,
                                                   color_mode='rgb',
                                               image_size=(31,31))
val_data_small
Found 3000 files belonging to 15 classes.
Out[347]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 31, 31, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [348]:
X_val_small = []
y_val_small = []


for images, labels in tqdm(val_data_small):
    images = tf.image.rgb_to_grayscale(images)
    X_val_small.append(images)
    y_val_small.append(labels)

X_val_small = np.concatenate(X_val_small, axis=0)
X_val_small = np.squeeze(X_val_small, axis=-1)
y_val_small = np.concatenate(y_val_small, axis=0)
100%|██████████| 94/94 [00:00<00:00, 239.20it/s]
In [349]:
test_data_small = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/test'  ,
                                                   color_mode='rgb',
                                               image_size=(31,31))
test_data_small
Found 3000 files belonging to 15 classes.
Out[349]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 31, 31, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [350]:
X_test_small = []
y_test_small = []


for images, labels in tqdm(test_data_small):
    images = tf.image.rgb_to_grayscale(images)
    X_test_small.append(images)
    y_test_small.append(labels)

X_test_small = np.concatenate(X_test_small, axis=0)
X_test_small = np.squeeze(X_test_small, axis=-1)
y_test_small = np.concatenate(y_test_small, axis=0)
100%|██████████| 94/94 [00:00<00:00, 260.74it/s]

Random Image Visualisation¶

  • Done on the smaller dataset
In [351]:
random_image_visualization(X_train_small, y_train_small, labels_dict, new=True, size='(31 x 31 Images)')

35 Image visualisation¶

In [352]:
visualize_first_images(X_train_small, y_train_small, labels_dict, new=True, size='(31 x 31 Images)')
Things Observed
    - Overall, smaller 31 x31 dataset are of pretty good equality

Pixel Distribution¶

  • New function was define for this
In [353]:
def visualise_pixel_distribution2(X_data, index1, index2):
    fig, ax = plt.subplots(2, 2, figsize=(15, 8))
    fig.suptitle('Pixel Distribution', fontsize=15, fontweight='bold')
    ax[0, 0].imshow(X_data[index1], cmap='gray') # First images
    ax[0, 0].axis('off')
    sns.histplot(X_data[index1].flatten(), ax=ax[1, 0], kde=True, color='blue')
    ax[0, 1].imshow(X_data[index2], cmap='gray') # Second images
    ax[0, 1].axis('off')
    sns.histplot(X_data[index2].flatten(), ax=ax[1, 1], kde=True, color='green')
    plt.show()
In [354]:
visualise_pixel_distribution2(X_train_small, index1=0, index2=1)
Things Observed
    - First Images - Follow a normal distribution - Second Images - has two picks here

Mean Pixel Distribution¶

Checking the mean pixke and standard deviation

Whole dataset¶

In [355]:
mean, std = np.mean(X_train_small) ,  np.std(X_train_small)
print('Mean of Images:', mean)
print('Standard Deviation of Images:', mean)
Mean of Images: 114.36305
Standard Deviation of Images: 114.36305
In [356]:
def average_img(X_data):
    average_image = np.mean(X_data.astype(np.uint8), axis=0) / 255
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.imshow(average_image, cmap='gray')
    ax.set_title('Average Image', fontsize=15, fontweight='bold')
    ax.axis('off')
    plt.show()
average_img(X_train_small)

By Class¶

In [357]:
def mean_class(X_data):
    fig, ax = plt.subplots(2, 5, figsize=(13, 8))
    fig.suptitle('Mean Pixel By Class', fontsize=15, fontweight='bold')
    for index, axs in enumerate(ax.ravel()):
      average = np.mean(X_data[np.squeeze(y_train == index)],axis=0)
      axs.imshow(average, cmap='gray')
      axs.set_title(f'Label: {labels_dict[index]}')
      axs.axis('off')
    plt.show()
In [358]:
mean_class(X_train_small)
Things Observed
    - Overall, the mean image is not as clear here

__Exploratory Data Analysis (128 x 128 Image)__¶


Here is to import the dataset and proceed to do analysis on it

In [359]:
data_big = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/train'  ,
                                                   color_mode='rgb',
                                               image_size=(128,128))
data_big
Found 9028 files belonging to 15 classes.
Out[359]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [360]:
X_train_big = []
y_train_big = []


for images, labels in tqdm(data_big):
    images = tf.image.rgb_to_grayscale(images)
    X_train_big.append(images)
    y_train_big.append(labels)

X_train_big = np.concatenate(X_train_big, axis=0)
X_train_big = np.squeeze(X_train_big, axis=-1)
y_train_big = np.concatenate(y_train_big, axis=0)
100%|██████████| 283/283 [00:01<00:00, 211.25it/s]
In [361]:
val_data_big = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/validation'  ,
                                                   color_mode='rgb',
                                               image_size=(128,128))
val_data_big
Found 3000 files belonging to 15 classes.
Out[361]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [362]:
X_val_big = []
y_val_big = []


for images, labels in tqdm(val_data_big):
    images = tf.image.rgb_to_grayscale(images)
    X_val_big.append(images)
    y_val_big.append(labels)

X_val_big = np.concatenate(X_val_big, axis=0)
X_val_big = np.squeeze(X_val_big, axis=-1)
y_val_big = np.concatenate(y_val_big, axis=0)
100%|██████████| 94/94 [00:00<00:00, 187.41it/s]
In [363]:
test_data_big = tf.keras.utils.image_dataset_from_directory('Dataset for CA1 part A/test'  ,
                                                   color_mode='rgb',
                                               image_size=(128,128))
test_data_big
Found 3000 files belonging to 15 classes.
Out[363]:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>
In [364]:
X_test_big = []
y_test_big = []


for images, labels in tqdm(test_data_big):
    images = tf.image.rgb_to_grayscale(images)
    X_test_big.append(images)
    y_test_big.append(labels)

X_test_big = np.concatenate(X_test_big, axis=0)
X_test_big = np.squeeze(X_test_big, axis=-1)
y_test_big = np.concatenate(y_test_big, axis=0)
100%|██████████| 94/94 [00:00<00:00, 185.17it/s]

Random Image Visualisation¶

  • Done on the bigger dataset
In [365]:
random_image_visualization(X_train_big, y_train_big, labels_dict, new=True, size='(128 x 128 Images)')

35 Image visualisation¶

In [366]:
visualize_first_images(X_train_big, y_train_big, labels_dict, new=True, size='(128 x 128 Images)')
Things Observed
    - Overall, smaller 31 x31 dataset are of clear and good quality

Pixel Distribution¶

In [367]:
visualise_pixel_distribution2(X_train_big, index1=0, index2=1)
Things Observed
    - First Images - First peak at the lower end of the intensity scale (dark pixels), meaning that a significant portion of the image contains dark areas. - Ssecond peak is around the mid-range of the intensity scale,showing thatg another substantial part of the image has medium brightne - Second Images - First peak at the lower end, indicating the presence of dark pixels, not as huge as second one - Secondr peak is towards the higher end of the intensity scale, suggesting a large number of bright pixel l>

__Overall__¶


The above exploratory data analysis is done on original, 31 x 31 and 128 x 128, next we can start developing our model